Abstract

It is well known that the branching process approach to the study
of the random graph $G_{n,p}$ gives a very simple way of understanding the
size of the giant component when it is fairly large (of order $\Theta(n)$). Here
we show that a variant of this approach works all the way down to the
phase transition: we use branching process arguments to give a simple
new derivation of the asymptotic size of the largest component whenever
$(np-1)^3 n \to \infty$.

1 Introduction 

Our aim in this note is to show how basic results about the survival probability of
branching processes can be used to give an essentially best possible result about
the emergence of the giant component in $G_{n,p}$, the random graph with vertex set
$[n] = \{1, 2, \ldots, n\}$ in which each edge is present independently with probability
$p$. In 1959, Erdős and Rényi [4] showed that if we take $p = p(n) = c/n$ where
$c$ is constant, then there is a 'phase transition' at $c = 1$. We write $L_1(G)$ for
the maximal number of vertices in a component of a graph $G$. Also, as usual,
we say that an event holds with high probability or whp if its probability tends
to 1 as $n \to \infty$. Erdős and Rényi showed that, whp, if $c < 1$ then $L_1(G_{n,c/n})$
is of logarithmic order, if $c = 1$ it is of order $n^{2/3}$, while if $c > 1$ then there is
a unique 'giant' component containing $\Theta(n)$ vertices, while the second largest
component is much smaller.

In 1984, Bollobás [1] noticed that this is only the starting point, and an
interesting question remains: what does the component structure of $G_{n,p}$ look
like for $p = (1+\varepsilon)/n$, where $\varepsilon = \varepsilon(n) \to 0$? He and Łuczak [6] showed that
if $\varepsilon = O(n^{-1/3})$ then $G_{n,p}$ behaves in a qualitatively similar way to $G_{n,1/n}$;
this range of $p$ is now called the scaling window or critical window of the phase
transition. The range $\varepsilon^3 n \to \infty$ is the supercritical regime, characterized by the
fact that there is whp a unique 'giant' component that is much larger than the
second largest component. The range $\varepsilon^3 n \to -\infty$ is the subcritical regime.

In this paper we are interested in the size of the giant component as it
emerges. Thus we consider the (weakly) supercritical regime where $p = p(n) =
(1+\varepsilon)/n$, with $\varepsilon = \varepsilon(n)$ satisfying

$$\varepsilon \to 0 \quad\text{and}\quad \varepsilon^3 n \to \infty \quad\text{as } n \to \infty. \tag{1}$$

Our aim here is to use branching processes to give a very simple new proof of the
following result, originally due to Bollobás [1] (with a mild extra assumption)
and Łuczak [6].

Theorem 1. Under the assumption (1) we have

$$L_1(G_{n,p}) = (2 + o_p(1))\varepsilon n.$$

Here $o_p(1)$ denotes a quantity that tends to 0 in probability: the statement
is that for any fixed $\delta > 0$, $L_1(G_{n,p})$ is in the range $(2 \pm \delta)\varepsilon n$ with probability
tending to 1 as $n \to \infty$.
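
As an informal numerical illustration of the statement (not part of the proof),
the following Python sketch samples the random graph with edge probability
p = (1 + eps)/n and reports the size of its largest component divided by eps*n.
The function name, the geometric-skip edge sampler and the union-find structure
are our own illustrative choices and do not come from the argument in this note.
For moderate n the ratio is only roughly 2, since the error term decays slowly.

```python
import math
import random


def largest_component(n, p, rng):
    """Sample G(n, p) and return the number of vertices in its largest component.

    Edges are generated by geometric skipping over the pairs (v, w), w < v,
    and components are merged with a union-find structure.
    """
    if n == 0:
        return 0
    if p <= 0.0:
        return 1          # no edges: every component is a single vertex
    if p >= 1.0:
        return n          # complete graph: one component

    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    log_q = math.log(1.0 - p)
    v, w = 1, -1
    while v < n:
        # Jump over a geometrically distributed number of absent pairs.
        w += 1 + int(math.log(1.0 - rng.random()) / log_q)
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            union(v, w)
    return max(size[find(x)] for x in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    n, eps = 200_000, 0.05            # here eps**3 * n = 25
    l1 = largest_component(n, (1.0 + eps) / n, rng)
    print(l1 / (eps * n))             # compare with the limiting value 2
```

Repeating the experiment with larger values of eps**3 * n should bring the
printed ratio closer to 2, in line with Theorem 1.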

Since the original papers [1], [6] (which in fact gave a more precise bound than
that above), many different proofs of many forms of Theorem 1 have been given.
For example, Nachmias and Peres [7] used martingale methods to reprove the
result as stated here. Pittel and Wormald [8] used counting methods to prove
an even more precise result; a simpler martingale proof of (part of) their result
is given in [3]. A proof of Theorem 1 combining tree counting and branching
process arguments appears in [2]. More recently, Krivelevich and Sudakov [5]
gave a very simple proof of a variant of Theorem 1 which is even weaker than
the original Erdős-Rényi result: $\varepsilon > 0$ is taken to be constant, and the size of
the giant component is determined only up to a constant factor.



2 Branching process preliminaries 

Let us start by recalling some basic concepts and results. The Galton-Watson
branching process with offspring distribution $Z$ is the random rooted tree
constructed as follows: start with a single root vertex in generation 0. Each vertex
in generation $t$ has a random number of children in generation $t+1$, with
distribution $Z$. The numbers of children are independent of each other and of
the history. It is well known and easy to check that if $\mathbb{E}[Z] > 1$, then the
process survives (is infinite) with probability $\rho$, the unique solution in $(0,1]$ to
$1 - \rho = f_Z(1-\rho)$, where $f_Z$ is the probability generating function of $Z$. When
$\mathbb{E}[Z] < 1$, the expectation of the total number of vertices in the branching
process is

$$1 + \mathbb{E}[Z] + \mathbb{E}[Z]^2 + \cdots = \frac{1}{1 - \mathbb{E}[Z]}, \tag{2}$$

and in particular the survival probability is 0.

Let us write $\mathcal{T}_{n,p}$ for the binomial branching process with parameters $n$ and
$p$, i.e., for the branching process as above with offspring distribution $\mathrm{Bi}(n,p)$.
Since the generating function of $\mathrm{Bi}(n,p)$ satisfies

$$f_{\mathrm{Bi}(n,p)}(x) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} x^k = (1 - p(1-x))^n,$$

when $np > 1$ the survival probability $\rho = \rho_{n,p}$ satisfies

$$1 - \rho = (1 - p\rho)^n.$$

From this it is easy to check that if $\varepsilon = np - 1 \to 0$ with $\varepsilon > 0$ then

$$\rho \sim 2\varepsilon. \tag{3}$$
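
The approximation (3) is easy to check numerically. The following Python sketch
solves the fixed-point equation for the survival probability (called rho in the
code) by bisection and compares the root with 2*eps. The function name and the
choice of bisection are ours, made only for illustration; the code assumes the
supercritical case n*p > 1.

```python
import math


def survival_probability(n, p, iterations=200):
    """Return the root rho in (0, 1) of 1 - rho = (1 - p*rho)**n.

    Assumes 0 < p < 1 and n*p > 1. The function f below is positive on
    (0, rho) and negative on (rho, 1], so bisection on [0, 1] converges
    to the nonzero root.
    """
    if not (0.0 < p < 1.0 and n * p > 1.0):
        raise ValueError("need 0 < p < 1 and n*p > 1")

    def f(rho):
        # f(rho) = (1 - rho) - (1 - p*rho)**n, evaluated stably for small rho.
        return -rho - math.expm1(n * math.log1p(-p * rho))

    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 10**6
    for eps in (0.1, 0.01, 0.001):
        rho = survival_probability(n, (1.0 + eps) / n)
        print(eps, rho / (2.0 * eps))   # the ratio approaches 1 as eps -> 0
```

The use of `math.log1p` and `math.expm1` avoids the loss of precision that a
direct evaluation of `(1 - p*rho)**n` would suffer when p*rho is tiny.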

Conditioning on a suitable branching process dying out (i.e., having finite
total size) one obtains another branching process, called the dual branching
process. In the binomial case, one way to see this is to think of $\mathcal{T}_{n,p}$ as a random
subgraph of the infinite $n$-ary rooted tree $\mathcal{T}_{n,1}$ obtained by including each edge
independently with probability $p$, and retaining only the component of the root.
For a vertex of $\mathcal{T}_{n,1}$ in generation 1 there are three possibilities: it may (i) be
absent, i.e., not joined to the root, (ii) survive, i.e., be joined to the root and have
infinitely many descendants, or (iii) die out. The probabilities of these events are
$1-p$, $p\rho$ and $p(1-\rho)$, respectively. Let $\mathcal{D}$ denote the event that the process $\mathcal{T}_{n,p}$
dies out, i.e., the total population is finite. Since $\mathcal{D}$ happens if and only if every
vertex of $\mathcal{T}_{n,1}$ in generation 1 is absent or dies out, the conditional distribution of
$\mathcal{T}_{n,p}$ given $\mathcal{D}$ is the unconditional distribution of $\mathcal{T}_{n,\pi}$, with $\pi = p(1-\rho)/(1-p\rho)$.
Thus the dual of $\mathcal{T}_{n,p}$ is $\mathcal{T}_{n,\pi}$.
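
The edge probability of the dual process can be checked with a short conditional
probability computation. The sketch below spells out this step, writing $\rho$ for
the survival probability and $\mathcal{D}$ for the event that the process dies out;
"absent" and "dies out" refer to a fixed generation-1 vertex, and the labels in
the displayed probabilities are ours. It uses the independence of the $n$
generation-1 subtrees, which is what makes the conditioning factorize.

```latex
% Given D, each generation-1 vertex is independently either absent or dies out.
\[
  \mathbb{P}(\text{joined to the root} \mid \mathcal{D})
  = \frac{\mathbb{P}(\text{dies out})}
         {\mathbb{P}(\text{absent}) + \mathbb{P}(\text{dies out})}
  = \frac{p(1-\rho)}{(1-p) + p(1-\rho)}
  = \frac{p(1-\rho)}{1 - p\rho} .
\]
% Consistency check: the probability that all n generation-1 vertices are
% absent or die out is
\[
  \mathbb{P}(\mathcal{D}) = \bigl((1-p) + p(1-\rho)\bigr)^n = (1 - p\rho)^n = 1 - \rho .
\]
```

Given that a generation-1 vertex is joined to the root and dies out, its subtree
is again a copy of the process conditioned to die out, so the same computation
applies recursively in every generation.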

Note that when $np - 1 = \varepsilon \to 0$, then

$$1 - n\pi = \frac{1 - p\rho - np + np\rho}{1 - p\rho} \sim np\rho - (np - 1) - p\rho \sim \varepsilon.$$

Hence the mean number of offspring in the dual process $\mathcal{T}_{n,\pi}$ is $1 - (1+o(1))\varepsilon$,
and from (2) its expected total size is $(1+o(1))\varepsilon^{-1}$. Writing $\mathcal{S} = \mathcal{D}^c$ for the
event that $\mathcal{T}_{n,p}$ survives (is infinite), and $|\mathcal{T}_{n,p}|$ for its total size (number of
vertices), it follows that for any integer $L = L(n)$ we have

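$$\begin{aligned}
\mathbb{P}(|\mathcal{T}_{n,p}| \ge L) &= \mathbb{P}(\mathcal{S}) + \mathbb{P}(\mathcal{D})\,\mathbb{P}(|\mathcal{T}_{n,\pi}| \ge L) \\
&\le \mathbb{P}(\mathcal{S}) + \mathbb{P}(|\mathcal{T}_{n,\pi}| \ge L) \\
&\le (1+o(1))\bigl(2\varepsilon + 1/(\varepsilon L)\bigr),
\end{aligned} \tag{4}$$

with the second inequality following from Markov's inequality.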

We shall use one further property of $\mathcal{T}_{n,p}$, which can be proved in a number of
simple ways. Suppose, as above, that $\varepsilon = np - 1 \to 0$, and let $M = M(n)$ satisfy
$\varepsilon M \to \infty$. Let $w(T)$ denote the width of a rooted tree $T$, i.e., the maximum
(supremum) of the sizes of the generations. Then

$$\mathbb{P}(\{w(\mathcal{T}_{n,p}) \ge M\} \cap \mathcal{D}) = o(\varepsilon). \tag{5}$$

To see this, consider testing whether the event $\mathcal{W}_M = \{w(\mathcal{T}_{n,p}) \ge M\}$ holds by
constructing $\mathcal{T}_{n,p}$ generation by generation, stopping at the first generation (if any)
of size at least $M$. If such a generation exists then (since the descendants of each vertex
in this generation form independent copies of $\mathcal{T}_{n,p}$), the conditional probability
that the process dies out is at most $(1-\rho)^M \le e^{-\rho M} \to 0$. Hence

$$\mathbb{P}(\mathcal{D} \mid \mathcal{W}_M) = o(1). \tag{6}$$

Thus

$$\mathbb{P}(\mathcal{W}_M) \sim \mathbb{P}(\mathcal{S} \cap \mathcal{W}_M) \le \mathbb{P}(\mathcal{S}) \sim 2\varepsilon,$$

which with (6) gives (5).

3 Application to $G_{n,p}$

The binomial branching process is intimately connected to the component
exploration process in $G_{n,p}$. Given a vertex $v$ of $G_{n,p}$, let $C_v$ denote the component
of $G_{n,p}$ containing $v$, and let $T_v$ be the random tree obtained by exploring this
component by breadth-first search. In other words, starting with $v$, find all
its neighbours, $v_1, \ldots, v_\ell$, say, next find all the neighbours of $v_1$ different from
the vertices found so far, then the new neighbours of $v_2$, and so on, ending the
second stage with the new neighbours of $v_\ell$. The third stage consists of finding
all the new neighbours of the vertices found in the second stage, and so on.
Eventually we build a tree $T_v$, which is a spanning tree of $C_v$.
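
The exploration just described can be sketched in a few lines of Python. This
is a minimal illustration under our own conventions (vertices are 0, ..., n-1 and
the function name is hypothetical). Edges are sampled lazily: a pair is tested
only when one endpoint is being explored and the other has not yet been
reached, so each pair is tested at most once and the result has the distribution
of the exploration tree in the random graph.

```python
import random
from collections import deque


def explore_component(n, p, v, rng):
    """Breadth-first exploration of the component of v in G(n, p).

    Returns (parent, generation_sizes): parent maps every reached vertex to
    its parent in the exploration tree (None for the root v), and
    generation_sizes[t] is the number of tree vertices at distance t from v.
    """
    parent = {v: None}
    depth = {v: 0}
    generation_sizes = [1]
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in range(n):
            if w in parent:
                continue              # already reached: not a new neighbour
            if rng.random() < p:      # the edge uw is present
                parent[w] = u
                depth[w] = depth[u] + 1
                if depth[w] == len(generation_sizes):
                    generation_sizes.append(0)
                generation_sizes[depth[w]] += 1
                queue.append(w)
    return parent, generation_sizes


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    parent, gens = explore_component(n, 1.5 / n, 0, rng)
    print(len(parent), max(gens))     # component size and width of the tree
```

When the current vertex is explored there are at most n - 1 untested candidates,
each accepted with probability p, which is the source of the comparison with the
binomial branching process.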

Note that our notation suppresses the fact that the distributions of $T_v$ and
of $C_v$ depend on $n$ and $p$. In the next lemma, as usual, $|H|$ denotes the total
number of vertices in a graph $H$.

Lemma 2. (i) For any $n$ and $p$, the random rooted trees $T_v$ and $\mathcal{T}_{n,p}$ may be
coupled so that $T_v \subseteq \mathcal{T}_{n,p}$.

(ii) For any $n$, $k$ and $p$ there is a coupling of the integer-valued random
variables $|C_v|$ and $|\mathcal{T}_{n-k,p}|$ so that either $|C_v| \ge |\mathcal{T}_{n-k,p}|$ or both are at least $k$.

Proof. For the first statement we simply generate $T_v$ and $\mathcal{T}_{n,p}$ together, always
adding fictitious vertices to the vertex set of $G_{n,p}$ for the branching process to
take from, so that in each step a vertex has $n$ potential new neighbours (some
fictitious) each of which it is joined to with probability $p$. All the descendants
of the fictitious vertices are themselves fictitious.

To prove (ii) we slightly modify the exploration, to couple a tree $T'_v$ contained
within $C_v$ with $\mathcal{T}_{n-k,p}$ such that one of two alternatives holds: either
$T'_v \supseteq \mathcal{T}_{n-k,p}$, or else both $T'_v$ and $\mathcal{T}_{n-k,p}$ have at least $k$ vertices. Indeed, construct
$T'_v$ exactly as $T_v$, except that at each step at the start of which we have not
yet reached more than $k$ vertices, we test for edges from the current vertex to
exactly $n-k$ potential new neighbours. Since $|C_v| \ge |T'_v|$, this coupling gives
the result. □

From now on we take $p = p(n) = (1+\varepsilon)/n$, where $\varepsilon = \varepsilon(n)$ satisfies (1).
We start by using the two couplings described above to give bounds on the
expected number of vertices in large components. In both lemmas, $N_{[L,n]}$ denotes
the number of vertices of $G_{n,p}$ in components with between $L$ and $n$ vertices
(inclusive); $\mathbb{P}_{n,p}$ and $\mathbb{E}_{n,p}$ denote the probability measure and expectation
associated to $G_{n,p}$.

Lemma 3. Suppose that $L = L(n) = o(\varepsilon n)$. Then $\mathbb{P}_{n,p}(|C_v| \ge L) \ge (2 + o(1))\varepsilon$.
Equivalently, $\mathbb{E}_{n,p}(N_{[L,n]}) \ge (2 + o(1))\varepsilon n$.

Proof. Taking $k = L$ in Lemma 2(ii),

$$\mathbb{P}_{n,p}(|C_v| \ge L) \ge \mathbb{P}(|\mathcal{T}_{n-L,p}| \ge L) \ge \mathbb{P}(\mathcal{T}_{n-L,p} \text{ survives}) \sim 2((n-L)p - 1) \sim 2\varepsilon,$$

where the approximation steps follow from (3) and the assumption on $L$. □

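Lemma 4. Suppose that $L = L(n)$ satisfies $\varepsilon^2 L \to \infty$. Then $\mathbb{E}_{n,p}(N_{[L,n]}) \le
(2 + o(1))\varepsilon n$.

Proof. By Lemma 2(i) and (4),

$$\mathbb{P}_{n,p}(|C_v| \ge L) \le \mathbb{P}(|\mathcal{T}_{n,p}| \ge L) \le (1 + o(1))\bigl(2\varepsilon + 1/(\varepsilon L)\bigr) \sim 2\varepsilon.$$

□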

Together these lemmas show that the expected number of vertices in
components of size at least $n^{2/3}$, say, is asymptotically $2\varepsilon n$. Two tasks remain: to
establish concentration, and to show that most vertices in large components
are in a single giant component. For the first task, one can simply count tree
components. (This is a little messy, but theoretically trivial. The difficulties
in the original papers [1], [6] stemmed from the fact that non-tree components
had to be counted as well. What is surprising is that here it suffices to count
tree components.) Indeed, applying the first and second moment methods to
the number $N$ of vertices in tree components of size at most $n^{2/3}/\omega$, where
$\omega = \omega(n) \to \infty$ sufficiently slowly, shows that this number is within $o_p(\varepsilon n)$ of
$(1-\rho)n$, reproving Lemma 3 and (together with Lemma 4) giving the required
concentration. See [2] for a version of this argument with a (best possible)
$O_p(\sqrt{n/\varepsilon})$ error term. Since the calculations, though requiring no ideas, are
somewhat lengthy, we take a different approach here.

Lemma 5. Suppose that $L = L(n)$ satisfies $\varepsilon^2 L \to \infty$ and $L = o(\varepsilon n)$. Then
$N_{[L,n]}(G_{n,p}) = (2 + o_p(1))\varepsilon n$.

Proof. Let $N$ be the number of vertices of $G_{n,p}$ in components of size at least
$L$. From Lemmas 3 and 4 the expectation $\mathbb{E}[N]$ of $N$ satisfies $\mathbb{E}[N] \sim 2\varepsilon n$, so it
suffices to show that

$$\mathbb{E}[N^2] \le (4 + o(1))\varepsilon^2 n^2. \tag{7}$$

Fix a vertex $v$ of $G_{n,p}$. Let us reveal a tree $T'_v$ spanning a subset $C'_v$ of
$C_v$ by exploring using breadth-first search as before, except that we stop the
exploration if at any point (i) we have reached $L$ vertices in total, or (ii) there
are $\varepsilon L$ vertices that have been reached (found as a new neighbour of an earlier
vertex) but not yet explored (tested for new neighbours). Note that condition
(ii) may happen partway through revealing a generation of $T'_v$, and indeed
partway through revealing the new neighbours of a vertex. We call a vertex reached
but not (fully) explored a boundary vertex, and note that there are at most
$\varepsilon L + 1 \le 2\varepsilon L$ boundary vertices. Let $\mathcal{A}$ be the event that we stop for reasons
(i) or (ii), rather than because we have revealed the whole component:

$$\mathcal{A} = \{\text{the exploration stops due to (i) or (ii) holding}\}.$$

Note that if $|C_v| \ge L$, then $\mathcal{A}$ holds.
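
The stopped exploration can be sketched as follows. This is a minimal
illustration with hypothetical names: `max_size` plays the role of the size
threshold in (i), `max_unexplored` the role of the threshold in (ii), and the
returned label records why the exploration ended. The sketch assumes
`max_size` is at least 2, so that the root alone does not trigger (i).

```python
import random
from collections import deque


def truncated_exploration(n, p, v, max_size, max_unexplored, rng):
    """Explore from v by breadth-first search, stopping early if
    (i) max_size vertices have been reached in total, or
    (ii) max_unexplored vertices are reached but not yet explored.

    Returns (reached, boundary, reason), where boundary lists the vertices
    reached but not fully explored at the moment of stopping (the vertex
    currently being explored is included, possibly conservatively) and
    reason is "size", "boundary" or "finished".
    """
    reached = {v}
    unexplored = deque([v])
    while unexplored:
        u = unexplored.popleft()      # u is the vertex now being explored
        for w in range(n):
            if w in reached:
                continue
            if rng.random() < p:      # the edge uw is present
                reached.add(w)
                unexplored.append(w)
                if len(reached) >= max_size:
                    return reached, [u, *unexplored], "size"
                if len(unexplored) >= max_unexplored:
                    return reached, [u, *unexplored], "boundary"
    return reached, [], "finished"    # the whole component was revealed


if __name__ == "__main__":
    rng = random.Random(0)
    n, eps, L = 20_000, 0.1, 2_000
    reached, boundary, reason = truncated_exploration(
        n, (1.0 + eps) / n, 0, L, int(eps * L), rng)
    print(len(reached), len(boundary), reason)
```

The stopping checks sit inside the inner loop because, as noted above, condition
(ii) can be triggered partway through revealing the new neighbours of a vertex;
this is why the boundary can contain one vertex more than `max_unexplored`.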

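As before, we may couple $T'_v$ with $\mathcal{T}_{n,p}$ so that $T'_v \subseteq \mathcal{T}_{n,p}$. Since the boundary
vertices correspond to a set of vertices of $\mathcal{T}_{n,p}$ contained in two consecutive
generations, if $\mathcal{A}$ holds, then either $|\mathcal{T}_{n,p}| \ge L$ or $w(\mathcal{T}_{n,p}) \ge \varepsilon L/2$. From (4) and
(5) it follows that $\mathbb{P}(\mathcal{A}) \le (2 + o(1))\varepsilon$.

Since all vertices are equivalent and $|C_v| \ge L$ implies that $\mathcal{A}$ holds, we have

$$\mathbb{E}[N^2] = n\,\mathbb{E}[1_{\{|C_v| \ge L\}} N] \le n\,\mathbb{E}[1_{\mathcal{A}} N] = n\,\mathbb{P}(\mathcal{A})\,\mathbb{E}[N \mid \mathcal{A}] \le (2 + o(1))\varepsilon n\,\mathbb{E}[N \mid \mathcal{A}]. \tag{8}$$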

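Suppose that $\mathcal{A}$ does hold. Given any vertex $w \notin C'_v$, we explore from $w$ as
usual, but within $G' = G_{n,p} \setminus V(C'_v)$, coupling the resulting tree $T'_w$ with $\mathcal{T}_{n,p}$
so that $T'_w \subseteq \mathcal{T}_{n,p}$. Let $C'_w$ be the component of $w$ in $G'$, so $C'_w$ is spanned by
$T'_w$. Let $\mathcal{S}$ be the event that (this final copy of) $\mathcal{T}_{n,p}$ is infinite, and let $\mathcal{D} = \mathcal{S}^c$.
Note that $C'_w \subseteq C_w$, and that the two are equal unless there is an edge from
$C'_w$ to some boundary vertex. Since there are at most $2\varepsilon L$ boundary vertices,
this last event has conditional probability at most $2\varepsilon L|C'_w|p \le 3\varepsilon L|C'_w|/n$, say.
Since $|C'_w| \le |\mathcal{T}_{n,p}|$, it follows that

$$\begin{aligned}
\mathbb{P}(|C_w| \ge L \mid \mathcal{A}) &\le \mathbb{P}(\mathcal{S}) + \mathbb{P}(\mathcal{D})\,\mathbb{P}(|C'_w| \ge L \mid \mathcal{D}) + 3\,\mathbb{P}(\mathcal{D})\,\varepsilon L n^{-1}\,\mathbb{E}[|C'_w| \mid \mathcal{D}] \\
&\le \mathbb{P}(\mathcal{S}) + \mathbb{P}(|\mathcal{T}_{n,p}| \ge L \mid \mathcal{D}) + 3\varepsilon L n^{-1}\,\mathbb{E}[|\mathcal{T}_{n,p}| \mid \mathcal{D}] \\
&\le \mathbb{P}(\mathcal{S}) + (L^{-1} + 3\varepsilon L n^{-1})\,\mathbb{E}[|\mathcal{T}_{n,p}| \mid \mathcal{D}],
\end{aligned}$$

by Markov's inequality. Since the final expectation above is $\sim \varepsilon^{-1}$ and our
assumptions give that both $L^{-1}$ and $3\varepsilon L n^{-1}$ are $o(\varepsilon^2)$, we see that
$\mathbb{P}(|C_w| \ge L \mid \mathcal{A}) \le (2 + o(1))\varepsilon$. Hence, recalling that there are at most $L$ vertices in $C'_v$,

$$\mathbb{E}[N \mid \mathcal{A}] \le L + (n - L)\,\mathbb{P}(|C_w| \ge L \mid \mathcal{A}) \le L + (2 + o(1))\varepsilon n \sim 2\varepsilon n.$$

Combined with (8) this gives (7). □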

To complete the proof of our main result, it remains only to show that almost
all vertices in large components are in a single giant component. For this we
use a simple form of the classical sprinkling argument of Erdős and Rényi [4].

Proof of Theorem 1. It will be convenient to write $\varepsilon = \omega n^{-1/3}$, with $\omega =
\omega(n) \to \infty$ and $\omega = o(n^{1/3})$. Also, let $\omega' \to \infty$ slowly, say with $\omega' = o(\log\log\omega)$.

Set $L = \varepsilon n/\omega'$. By Lemma 5 there are in total at most $(2 + o_p(1))\varepsilon n$ vertices
in components of size larger than $L$, which gives the upper bound on $L_1$.

For the lower bound, set $p_1 = n^{-4/3}$, and define $p_0$ by $p_0 + p_1 - p_0 p_1 = p$,
so that if first we choose the edges with probability $p_0$ and then (we sprinkle
some more) with probability $p_1$ then the random graph we get is exactly $G_{n,p}$.
Since $np_0 - 1 = (1 + o(1))\varepsilon$, for any $\delta > 0$ Lemma 5 shows that with probability
$1 - o(1)$ the graph $G_{n,p_0}$ has at least $(2 - \delta)\varepsilon n$ vertices in components of size at
least $L$.
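
The two-round decomposition can be checked numerically. The following Python
sketch (function name ours, for illustration only) computes the first-round
probability from the defining relation, which is equivalent to
1 - p = (1 - p0)(1 - p1), and confirms that n*p0 - 1 stays close to eps when
eps = omega * n**(-1/3) with omega large.

```python
def sprinkling_probabilities(n, eps):
    """Return (p, p0, p1) with p = (1 + eps)/n, p1 = n**(-4/3) and
    p0 + p1 - p0*p1 = p, i.e. 1 - p = (1 - p0)*(1 - p1)."""
    p = (1.0 + eps) / n
    p1 = n ** (-4.0 / 3.0)
    p0 = (p - p1) / (1.0 - p1)
    return p, p0, p1


if __name__ == "__main__":
    n = 10**9
    for omega in (10.0, 100.0):
        eps = omega * n ** (-1.0 / 3.0)
        p, p0, p1 = sprinkling_probabilities(n, eps)
        # n*p0 - 1 is approximately eps - n**(-1/3), so the printed ratio
        # is approximately 1 - 1/omega.
        print(omega, (n * p0 - 1.0) / eps)
```

An edge is absent from the final graph exactly when it is absent in both rounds,
which is why the union of the two independent rounds has edge probability p.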

Let $U_1, \ldots, U_\ell$ be the vertex sets of the components of $G_{n,p_0}$ of size at least
$L$, and let $U$ be their union. The probability that no edge sprinkled with probability
$p_1$ joins $U_1$ to $U_j$ is

$$(1 - p_1)^{|U_1||U_j|} \le e^{-p_1 L^2} = \exp\bigl(-n^{-4/3}\omega^2 n^{4/3}/(\omega')^2\bigr) = \exp\bigl(-(\omega/\omega')^2\bigr),$$

so the expected number of vertices of $U$ not contained in the component of $G_{n,p}$
containing $U_1$ is at most

$$\sum_{j=2}^{\ell} \exp\bigl(-(\omega/\omega')^2\bigr)\,|U_j| = o(|U|).$$

Consequently, with probability $1 - o(1)$ all but at most $\delta|U|$ vertices of $U$ are
contained within a single component of $G_{n,p}$, in which case $L_1(G_{n,p}) \ge
(1 - \delta)(2 - \delta)\varepsilon n$. Since $\delta > 0$ was arbitrary, it follows that $L_1(G_{n,p}) \ge (2 - o_p(1))\varepsilon n$,
completing the proof. □

To conclude, let us remark that although Theorem 1 is a key result about
the phase transition, as discussed in the introduction it is far from the final
word on the topic.